US Police Killings Analysis

Posted on Sun 23 September 2018 in Data Analysis

US Police Killings

The data set describes shootings of civilians by police in the US. It contains information on each police killing from January 2015 to June 2015.

The goal is to investigate these shootings.

In [2]:
import pandas as pd
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)
Out[2]:
name age gender raceethnicity month day year streetaddress city state ... share_hispanic p_income h_income county_income comp_income county_bucket nat_bucket pov urate college
0 A'donte Washington 16 Male Black February 23 2015 Clearview Ln Millbrook AL ... 5.6 28375 51367.0 54766 0.937936 3.0 3.0 14.1 0.097686 0.168510
1 Aaron Rutledge 27 Male White April 2 2015 300 block Iris Park Dr Pineville LA ... 0.5 14678 27972.0 40930 0.683411 2.0 1.0 28.8 0.065724 0.111402
2 Aaron Siler 26 Male White March 14 2015 22nd Ave and 56th St Kenosha WI ... 16.8 25286 45365.0 54930 0.825869 2.0 3.0 14.6 0.166293 0.147312
3 Aaron Valdez 25 Male Hispanic/Latino March 11 2015 3000 Seminole Ave South Gate CA ... 98.8 17194 48295.0 55909 0.863814 3.0 3.0 11.7 0.124827 0.050133
4 Adam Jovicic 29 Male White March 19 2015 364 Hiwood Ave Munroe Falls OH ... 1.7 33954 68785.0 49669 1.384868 5.0 4.0 1.9 0.063550 0.403954

5 rows × 34 columns

In [3]:
police_killings.columns
Out[3]:
Index(['name', 'age', 'gender', 'raceethnicity', 'month', 'day', 'year',
       'streetaddress', 'city', 'state', 'latitude', 'longitude', 'state_fp',
       'county_fp', 'tract_ce', 'geo_id', 'county_id', 'namelsad',
       'lawenforcementagency', 'cause', 'armed', 'pop', 'share_white',
       'share_black', 'share_hispanic', 'p_income', 'h_income',
       'county_income', 'comp_income', 'county_bucket', 'nat_bucket', 'pov',
       'urate', 'college'],
      dtype='object')
In [4]:
count_race = police_killings["raceethnicity"].value_counts()
In [5]:
%matplotlib inline
import matplotlib.pyplot as plt

Shooting by Race

In [6]:
# Use the number of categories rather than a hard-coded 6
positions = range(len(count_race))
plt.bar(positions, count_race.values)
plt.xticks(positions, count_race.index, rotation="vertical")
plt.show()
In [7]:
count_race / sum(count_race)
Out[7]:
White                     0.505353
Black                     0.289079
Hispanic/Latino           0.143469
Unknown                   0.032120
Asian/Pacific Islander    0.021413
Native American           0.008565
Name: raceethnicity, dtype: float64
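
The same shares can also be obtained directly from `value_counts`. The sketch below uses a small made-up Series in place of `police_killings["raceethnicity"]`, so the values are illustrative only.

```python
import pandas as pd

# Toy stand-in for police_killings["raceethnicity"]; values are illustrative only.
race = pd.Series(["White", "White", "Black", "Hispanic/Latino"])

# normalize=True returns each category's share of the total instead of raw counts.
shares = race.value_counts(normalize=True)
print(shares)
```

On the real data this should be equivalent to `count_race / sum(count_race)`.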

Shootings By Regional Income

In [8]:
income = police_killings["p_income"][police_killings["p_income"] != '-'].astype('int')
plt.hist(income,bins=30)
plt.show()
In [21]:
# Reuse the cleaned income Series built for the histogram above
income.median()
Out[21]:
22348.0

According to the Census, median personal income in the US is 28,567, while the median in our data is 22,348, which suggests that shootings tend to happen in less affluent areas. Our sample is relatively small, though, so it is hard to draw firm conclusions.
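
One way to get a feel for how much a small sample limits this comparison is a percentile bootstrap interval around the median. This is only a sketch: the helper `bootstrap_median_ci` is a hypothetical name, the incomes are synthetic, and the percentile bootstrap is one choice among several.

```python
import random
import statistics

def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]

# Synthetic incomes for illustration only, not the real p_income values.
demo_rng = random.Random(1)
sample = [demo_rng.randint(10_000, 40_000) for _ in range(200)]

low, high = bootstrap_median_ci(sample)
print(low, high)
```

With the real data, `income.tolist()` would take the place of `sample`. If the resulting interval stayed below the national figure, that would lend some support to the comparison.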

Shootings By State

In [10]:
state_pop = pd.read_csv("state_population.csv")
In [11]:
counts = police_killings["state_fp"].value_counts()
# counts: pandas Series whose index is the code for each state
# and whose values are the number of police killings in that state.
In [12]:
states = pd.DataFrame({"STATE": counts.index, "shootings": counts})
states = state_pop.merge(states, on = "STATE")
# STATE is the common column that both states and state_pop share.
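
Note that `merge` performs an inner join by default, so a state with no recorded killings would drop out of the result. The sketch below shows a left join that keeps such states with a count of zero. The toy frames stand in for `police_killings["state_fp"]` and `state_population.csv`, and only the `STATE` and `NAME` columns are reproduced.

```python
import pandas as pd

# Toy stand-ins for the real inputs; values are illustrative only.
state_fp = pd.Series([1, 1, 4, 4, 4])
state_pop = pd.DataFrame(
    {"STATE": [1, 2, 4], "NAME": ["Alabama", "Alaska", "Arizona"]}
)

# Count killings per state code and turn the result into a two-column frame.
counts = (
    state_fp.value_counts()
    .rename_axis("STATE")
    .reset_index(name="shootings")
)

# how="left" keeps every row of state_pop; states missing from counts get NaN,
# which is then replaced by 0.
states = state_pop.merge(counts, on="STATE", how="left")
states["shootings"] = states["shootings"].fillna(0).astype(int)
print(states)
```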
In [13]:
states["pop_millions"] = states["POPESTIMATE2015"]/1000000
In [14]:
states["rate"] = states["shootings"]/states["pop_millions"]
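
As a worked example of the rate, here is the same arithmetic for a single state, using the Oklahoma figures that appear in the output table (22 killings, 2015 population estimate 3,911,338). The function name `rate_per_million` is made up for this sketch.

```python
def rate_per_million(shootings, population):
    """Killings per million residents (hypothetical helper)."""
    return shootings / (population / 1_000_000)

# Oklahoma row; the table lists 5.624674 for this state.
oklahoma_rate = rate_per_million(22, 3_911_338)
print(round(oklahoma_rate, 6))
```

Since the data only covers January to June 2015, these are rates for that half-year window, not annual rates.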
In [15]:
# DataFrame.sort is deprecated; sort_values is the replacement
states.sort_values("rate")
Out[15]:
SUMLEV REGION DIVISION STATE NAME POPESTIMATE2015 POPEST18PLUS2015 PCNT_POPEST18PLUS shootings pop_millions rate
6 40 1 1 9 Connecticut 3590886 2826827 78.7 1 3.590886 0.278483
37 40 1 2 42 Pennsylvania 12802503 10112229 79.0 7 12.802503 0.546768
15 40 2 4 19 Iowa 3123899 2395103 76.7 2 3.123899 0.640226
32 40 1 2 36 New York 19795791 15584974 78.7 13 19.795791 0.656705
21 40 1 1 25 Massachusetts 6794422 5407335 79.6 5 6.794422 0.735898
29 40 1 1 33 New Hampshire 1330608 1066610 80.2 1 1.330608 0.751536
19 40 1 1 23 Maine 1329328 1072948 80.7 1 1.329328 0.752260
13 40 2 3 17 Illinois 12859995 9901322 77.0 11 12.859995 0.855366
34 40 2 3 39 Ohio 11613423 8984946 77.4 10 11.613423 0.861073
45 40 2 3 55 Wisconsin 5771337 4476711 77.6 5 5.771337 0.866350
22 40 2 3 26 Michigan 9922576 7715272 77.8 9 9.922576 0.907023
39 40 3 6 47 Tennessee 6600299 5102688 77.3 6 6.600299 0.909050
33 40 3 5 37 North Carolina 10042802 7752234 77.2 10 10.042802 0.995738
28 40 4 8 32 Nevada 2890845 2221681 76.9 3 2.890845 1.037759
42 40 3 5 51 Virginia 8382993 6512571 77.7 9 8.382993 1.073602
44 40 3 5 54 West Virginia 1844128 1464532 79.4 2 1.844128 1.084523
23 40 2 4 27 Minnesota 5489594 4205207 76.6 6 5.489594 1.092977
14 40 2 3 18 Indiana 6619680 5040224 76.1 8 6.619680 1.208518
30 40 1 2 34 New Jersey 8958013 6959192 77.7 11 8.958013 1.227951
3 40 3 7 5 Arkansas 2978204 2272904 76.3 4 2.978204 1.343091
9 40 3 5 12 Florida 20271272 16166143 79.7 29 20.271272 1.430596
8 40 3 5 11 District of Columbia 672228 554121 82.4 1 0.672228 1.487591
43 40 4 9 53 Washington 7170351 5558509 77.5 11 7.170351 1.534095
10 40 3 5 13 Georgia 10214860 7710688 75.5 16 10.214860 1.566346
17 40 3 6 21 Kentucky 4425092 3413425 77.1 7 4.425092 1.581888
25 40 2 4 29 Missouri 6083672 4692196 77.1 10 6.083672 1.643744
0 40 3 6 1 Alabama 4858979 3755483 77.3 8 4.858979 1.646436
20 40 3 5 24 Maryland 6006401 4658175 77.6 10 6.006401 1.664891
41 40 4 8 49 Utah 2995919 2083423 69.5 5 2.995919 1.668937
46 40 4 8 56 Wyoming 586107 447212 76.3 1 0.586107 1.706173
40 40 3 7 48 Texas 27469114 20257343 73.7 47 27.469114 1.711013
38 40 3 5 45 South Carolina 4896146 3804558 77.7 9 4.896146 1.838180
4 40 4 9 6 California 39144818 30023902 76.7 74 39.144818 1.890416
26 40 4 8 30 Montana 1032949 806529 78.1 2 1.032949 1.936204
36 40 4 9 41 Oregon 4028977 3166121 78.6 8 4.028977 1.985616
24 40 3 6 28 Mississippi 2992333 2265485 75.7 6 2.992333 2.005124
16 40 2 4 20 Kansas 2911641 2192084 75.3 6 2.911641 2.060694
7 40 3 5 10 Delaware 945934 741548 78.4 2 0.945934 2.114312
5 40 4 8 8 Colorado 5456574 4199509 77.0 12 5.456574 2.199182
18 40 3 7 22 Louisiana 4670724 3555911 76.1 11 4.670724 2.355095
31 40 4 8 35 New Mexico 2085109 1588201 76.2 5 2.085109 2.397956
12 40 4 8 16 Idaho 1654930 1222093 73.8 4 1.654930 2.417021
1 40 4 9 2 Alaska 738432 552166 74.8 2 0.738432 2.708442
11 40 4 9 15 Hawaii 1431603 1120770 78.3 4 1.431603 2.794071
27 40 2 4 31 Nebraska 1896190 1425853 75.2 6 1.896190 3.164240
2 40 4 8 4 Arizona 6828065 5205215 76.2 25 6.828065 3.661359
35 40 3 7 40 Oklahoma 3911338 2950017 75.4 22 3.911338 5.624674

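States in the West and South seem to have the highest police killing rates: of the ten highest-rate states, six are in the West (Arizona, Hawaii, Alaska, Idaho, New Mexico, Colorado), three in the South (Oklahoma, Louisiana, Delaware) and one in the Midwest (Nebraska). States in the Northeast, along with a few Midwestern states, seem to have the lowest.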
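
A regional pattern like this can be checked by aggregating on the `REGION` column, which uses the Census Bureau codes 1 (Northeast), 2 (Midwest), 3 (South) and 4 (West). The sketch below does so for eight rows copied from the table above. It assumes `REGION` is stored as integers, and the subset is too small to stand in for the full comparison.

```python
import pandas as pd

REGION_NAMES = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}

# Eight rows copied from the table above; a subset for illustration only.
states = pd.DataFrame({
    "NAME": ["Connecticut", "New York", "Ohio", "Nebraska",
             "Texas", "Oklahoma", "California", "Arizona"],
    "REGION": [1, 1, 2, 2, 3, 3, 4, 4],
    "POPESTIMATE2015": [3590886, 19795791, 11613423, 1896190,
                        27469114, 3911338, 39144818, 6828065],
    "shootings": [1, 13, 10, 6, 47, 22, 74, 25],
})

# Sum killings and population per region, then recompute the rate so that
# large and small states are weighted by population.
by_region = states.groupby("REGION")[["shootings", "POPESTIMATE2015"]].sum()
by_region["rate"] = by_region["shootings"] / (by_region["POPESTIMATE2015"] / 1_000_000)
by_region.index = by_region.index.map(REGION_NAMES)
print(by_region.sort_values("rate"))
```

Summing before dividing is a deliberate choice here: averaging the per-state rates instead would give small states the same weight as large ones.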

State By State Differences

Dig deeper into the data to try to explain the differences in police killing rates between states.

In [36]:
pk = police_killings[(police_killings["share_white"] != "-")
                     & (police_killings["share_black"] != "-")
                     & (police_killings["share_hispanic"] != "-")].copy()
# .copy() makes pk an independent DataFrame, so the assignments below
# do not trigger SettingWithCopyWarning.

pk["share_white"] = pk["share_white"].astype('float')
pk["share_black"] = pk["share_black"].astype('float')
pk["share_hispanic"] = pk["share_hispanic"].astype('float')
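
An alternative to filtering out the "-" placeholders by hand is to let pandas convert them to missing values. The following sketch uses `pd.to_numeric` with `errors="coerce"` on a small made-up frame; the column names match the data set but the values are invented.

```python
import pandas as pd

# Toy frame with "-" placeholders, mimicking the share_* columns.
raw = pd.DataFrame({
    "share_white": ["60.5", "-", "12.0"],
    "share_black": ["30.1", "5.0", "-"],
})
cols = ["share_white", "share_black"]

clean = raw.copy()
# errors="coerce" turns anything that cannot be parsed as a number into NaN.
clean[cols] = clean[cols].apply(pd.to_numeric, errors="coerce")
# Drop rows where any of the share columns is missing.
clean = clean.dropna(subset=cols)
print(clean)
```

Passing `na_values="-"` to `pd.read_csv` would be another way to reach a similar result at load time.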
In [56]:
lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]

ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]
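
The two lists above were typed in by hand from the sorted table. As a sketch of how they could be derived instead, the example below picks the states with the smallest and largest rates using `nsmallest` and `nlargest`, then filters on `state_fp`, which holds the same code as `STATE`. The toy frames and the choice of `n = 2` are for illustration only.

```python
import pandas as pd

# Toy stand-ins for the states and pk frames; rates are copied from the table above.
states = pd.DataFrame({
    "STATE": [9, 40, 4, 36],
    "rate": [0.278483, 5.624674, 3.661359, 0.656705],
})
pk = pd.DataFrame({
    "state_fp": [9, 40, 40, 4, 36, 6],
    "state": ["CT", "OK", "OK", "AZ", "NY", "CA"],
})

n = 2  # the post uses the 10 lowest and 10 highest states

# State codes with the n smallest and n largest rates.
lowest_fp = states.nsmallest(n, "rate")["STATE"]
highest_fp = states.nlargest(n, "rate")["STATE"]

ls = pk[pk["state_fp"].isin(lowest_fp)]
hs = pk[pk["state_fp"].isin(highest_fp)]
print(sorted(ls["state"].unique()), sorted(hs["state"].unique()))
```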

Means for the States with the Lowest Shooting Rates

In [65]:
ls[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()
Out[65]:
pop                4201.660714
county_income     54830.839286
share_white          60.616071
share_black          21.257143
share_hispanic       12.948214
dtype: float64

Means for the States with the Highest Shooting Rates

In [66]:
hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].mean()
Out[66]:
pop                4315.750000
county_income     48706.967391
share_white          55.652174
share_black          11.532609
share_hispanic       20.693478
dtype: float64

It looks like the states with low rates of shootings tend to have a higher proportion of Black residents (21.3% versus 11.5%) and a lower proportion of Hispanic residents (12.9% versus 20.7%) in the census tracts where the shootings occur. The income of the counties where the shootings occur is also higher in these states (about 54,831 versus 48,707).

Conversely, states with high rates of shootings tend to have high Hispanic population shares in the census tracts where the shootings occur.
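
To make the comparison easier to read, the two sets of means can be placed side by side with their difference. The sketch below rebuilds the two Series from the values printed in Out[65] and Out[66].

```python
import pandas as pd

# Means copied from Out[65] (lowest-rate states) and Out[66] (highest-rate states).
lowest = pd.Series({
    "pop": 4201.660714, "county_income": 54830.839286,
    "share_white": 60.616071, "share_black": 21.257143,
    "share_hispanic": 12.948214,
})
highest = pd.Series({
    "pop": 4315.750000, "county_income": 48706.967391,
    "share_white": 55.652174, "share_black": 11.532609,
    "share_hispanic": 20.693478,
})

comparison = pd.DataFrame({"lowest": lowest, "highest": highest})
# Positive values mean the figure is larger in the highest-rate states.
comparison["difference"] = comparison["highest"] - comparison["lowest"]
print(comparison.round(2))
```

With the real frames, `ls[cols].mean()` and `hs[cols].mean()` would replace the hand-copied Series. These are differences in means only, and they say nothing about statistical significance.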

In [67]:
hs[["pop", "county_income",
    "share_white", "share_black", "share_hispanic"]].describe()
Out[67]:
                pop  county_income  share_white  share_black  share_hispanic
count     92.000000      92.000000    92.000000    92.000000       92.000000
mean    4315.750000   48706.967391    55.652174    11.532609       20.693478
std     2063.723609    9839.206872    24.406158    19.591303       20.415690
min      403.000000   25498.000000     2.100000     0.000000        0.000000
25%     2886.000000   42987.000000    39.175000     0.675000        4.350000
50%     4257.500000   48801.000000    58.200000     2.700000       10.850000
75%     5377.000000   53596.000000    74.200000    11.550000       31.725000
max    13561.000000   77454.000000    95.900000    93.100000       81.500000